feature: zombie process reaper #134

Open
strebitz wants to merge 1 commit into upbound:main from nephosolutions:feature/zombie-process-reaper

Conversation

@strebitz

@strebitz strebitz commented Apr 1, 2026

The zombie processes appear because crossplane-opentofu-provider (PID 1) becomes the parent of child processes (git, called internally by tofu init -from-module=<git url>) but never calls Wait() on them after they exit.

When a process exits on Linux, it remains in a defunct (zombie) state until its parent calls wait() / waitpid() to reap it. Orphaned children are re-parented to PID 1, and a conventional init such as systemd reaps them automatically. Because crossplane-opentofu-provider is PID 1 in the container but has no zombie-reaping logic, every finished child that gets re-parented to it (e.g. git sub-processes spawned by tofu) stays a zombie for as long as the provider process runs.

Description of your changes

The fix makes crossplane-opentofu-provider a child subreaper via prctl(PR_SET_CHILD_SUBREAPER, 1) on Linux and starts a background goroutine that, on each SIGCHLD, calls syscall.Wait4(-1, ...) with WNOHANG in a loop to reap any zombie children that have been re-parented to it.
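For readers who want to see the shape of the approach, here is a minimal sketch of a subreaper plus a SIGCHLD-driven reaping loop. It is not the code from this PR: the function names setSubreaper, reapAll and start only mirror the names in the test list, their bodies are assumptions, and the constant prSetChildSubreaper (value 36, taken from linux/prctl.h) is defined locally because the sketch sticks to the standard library.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// prSetChildSubreaper is PR_SET_CHILD_SUBREAPER from <linux/prctl.h>.
// Defined locally (assumption of this sketch) to avoid extra dependencies.
const prSetChildSubreaper = 36

// setSubreaper marks the calling process as a child subreaper, so orphaned
// descendants are re-parented to it instead of to the namespace's init.
func setSubreaper() error {
	_, _, errno := syscall.RawSyscall(syscall.SYS_PRCTL, prSetChildSubreaper, 1, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// reapAll collects every child that has already exited, without blocking,
// and returns how many were reaped.
func reapAll() int {
	reaped := 0
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, retry
		}
		if err != nil || pid <= 0 {
			// ECHILD: no children at all; pid == 0: children exist
			// but none has exited yet.
			return reaped
		}
		reaped++
	}
}

// start enables subreaper mode and reaps in a background goroutine
// whenever SIGCHLD arrives, until stop is closed.
func start(stop <-chan struct{}) error {
	if err := setSubreaper(); err != nil {
		return fmt.Errorf("set subreaper: %w", err)
	}
	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs, syscall.SIGCHLD)
	go func() {
		defer signal.Stop(sigs)
		for {
			select {
			case <-sigs:
				reapAll()
			case <-stop:
				return
			}
		}
	}()
	return nil
}

func main() {
	stop := make(chan struct{})
	if err := start(stop); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("children reaped at startup:", reapAll())
	close(stop)
}
```

One design point worth checking in review: Wait4(-1, ...) collects any exited child, so it may also consume the exit status of a process the provider started itself through os/exec, in which case that command's Wait() can fail with "no child processes". The sketch does not address this.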

Fixes #74

I have:

  • Run make reviewable to ensure this PR is ready for review.

How has this code been tested

| Test | File | What it validates |
| --- | --- | --- |
| TestReapAllNoChildren | reaper_test.go | reapAll returns immediately (doesn't block) when there are no children, covering WNOHANG and the exit condition pid <= 0 |
| TestReapAllDoesNotPanic | reaper_test.go | Repeated calls to reapAll with no children never panic |
| TestSetSubreaper | reaper_linux_test.go | prctl(PR_SET_CHILD_SUBREAPER, 1) succeeds without error |
| TestSetSubreaperIdempotent | reaper_linux_test.go | Calling setSubreaper multiple times is safe (kernel allows re-setting the flag) |
| TestReapAllReapsDirectChild | reaper_linux_test.go | A single zombie child (verified via /proc/<pid>/status state Z) is removed from the process table after reapAll |
| TestReapAllReapsMultipleChildren | reaper_linux_test.go | The internal reapAll loop drains all pending zombies in one invocation |
| TestReapAllAfterSIGKILL | reaper_linux_test.go | A SIGKILL'd child (non-zero exit) is also correctly reaped |
| TestStartReapsChildAfterExit | reaper_linux_test.go | The full Start() path (SIGCHLD handler plus background goroutine) automatically reaps a child without any explicit Wait call |
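As an illustration of the zombie check described for TestReapAllReapsDirectChild, the sketch below starts a child, waits until /proc/<pid>/status reports state Z, reaps it, and confirms the /proc entry is gone. It is not the test from this PR: procState, waitForState and zombieRoundTrip are hypothetical helper names, the child is assumed to be the `true` binary on PATH, and the reap uses a direct Wait4 on the child's PID rather than the PR's reapAll.

```go
//go:build linux

package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
	"syscall"
	"time"
)

// procState returns the one-letter process state from /proc/<pid>/status.
func procState(pid int) (string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "State:" {
			return fields[1], nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no State line for pid %d", pid)
}

// waitForState polls until pid reports the wanted state or the timeout passes.
func waitForState(pid int, want string, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if s, err := procState(pid); err == nil && s == want {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

// zombieRoundTrip starts a short-lived child without waiting on it, checks
// that it becomes a zombie, reaps it, and checks that /proc/<pid> is gone.
func zombieRoundTrip() (wasZombie, goneAfterReap bool, err error) {
	cmd := exec.Command("true") // assumption: `true` exists on PATH
	if err := cmd.Start(); err != nil {
		return false, false, err
	}
	pid := cmd.Process.Pid
	wasZombie = waitForState(pid, "Z", 2*time.Second)

	// Reap the child directly; cmd.Wait is deliberately never called.
	for {
		var status syscall.WaitStatus
		_, werr := syscall.Wait4(pid, &status, 0, nil)
		if werr == syscall.EINTR {
			continue
		}
		if werr != nil {
			return wasZombie, false, werr
		}
		break
	}
	_, statErr := os.Stat(fmt.Sprintf("/proc/%d", pid))
	return wasZombie, os.IsNotExist(statErr), nil
}

func main() {
	z, gone, err := zombieRoundTrip()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("became zombie:", z, "removed after reap:", gone)
}
```

Polling /proc with a deadline keeps the check from depending on how quickly the child exits.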

The cross-platform tests carry no build tag, so they run everywhere. The Linux integration tests are gated with //go:build linux because they depend on /proc, SIGCHLD behaviour, and Wait4 semantics that only exist on Linux, which is the only platform where the container runs anyway.

@Upbound-CLA

Upbound-CLA commented Apr 1, 2026

CLA assistant check
All committers have signed the CLA.

Add a background goroutine that continuously reaps any
children that get re-parented to the
crossplane-opentofu-provider process.

Signed-off-by: Sebastian Trebitz <sebastian@nephosolutions.com>
@strebitz strebitz force-pushed the feature/zombie-process-reaper branch from 0267109 to b946474 on April 1, 2026 14:35

Development

Successfully merging this pull request may close these issues.

Provider OpenToFu creating zombie git processes

2 participants